Genomic differences between sequence types 1 and 104 of Streptococcus suis Serotype 2

Background. Streptococcus suis is a zoonotic pathogen that can cause invasive infections in humans who are in close contact with infected pigs or contaminated pork-derived products. S. suis serotype 2 sequence type (ST) 1 strains are mostly associated with meningitis, whereas ST104 strains are mostly recovered from sepsis cases in humans. No data are available for comparison of the ST1 and ST104 strains at the genomic level, particularly concerning virulence-associated genes. Thus, genomic comparison of both STs was performed in this study. Methods. An ST1 isolate (ID26154) from the cerebrospinal fluid of a patient with meningitis and an ST104 isolate (ID24525) from the blood of a patient with sepsis were subjected to shotgun pyrosequencing using the 454 GS Junior System. Genomic comparison was conducted between the ST1 isolate and the ST104 isolate using the Artemis Comparison Tool (ACT) to identify the regions of difference (RDs) between ST1 and ST104. Results. Fifty-eight RDs were unique to the ST104 genome and were mainly involved in metabolism and cell functional activities, cell wall anchored proteins, bacteriophages and mobile genetic elements, ABC-type transporters, two-component signal transduction, and lantibiotic proteins. Some virulence genes mostly found in ST1 strains were also present in the ST104 genome. Whole-genome comparison is a powerful tool for identifying genomic region differences between different STs of S. suis serotype 2, leading to the identification of the molecular basis of virulence involved in the pathogenesis of the infection.

According to our previous reports, human S. suis serotype 2 ST1 strains are significantly associated with meningitis. In contrast, those belonging to ST104 are mostly related to non-meningitis cases, particularly sepsis (Kerdsin et al., 2011, 2018). ST1 belongs to CC1, whereas ST104, isolated in Thailand, belongs to CC104 (Kerdsin et al., 2011, 2018). The differences in the clinical diseases caused by ST1 and ST104 may therefore be influenced by their genetic backgrounds. No studies have compared the ST1 and ST104 strains at the genomic level, particularly concerning the regions of sequence difference (RDs) that may contain virulence-associated genes, pathogenicity islands, and prophages involved in virulence or the pathogenesis of the infection. Thus, genomic comparison of both STs was performed in this study to understand the pathogenic potential of the strains, especially the ST104 strains, which are more often related to sepsis than to meningitis.

Ethical approval
This study used strains obtained from human specimens. The study was reviewed and approved by the Ethics Committees of the Maharaj Nakhon Ratchasima Hospital, Thailand, from where the isolates were obtained (Memorandum no. 8/2564). The Ethics Committees waived the requirement for informed consent as the study did not require any personal patient data and satisfied the conditions of the policy statement on ethical conduct for research involving humans. This study was conducted according to the principles of the Declaration of Helsinki.
S. suis strains and DNA isolation
S. suis serotype 2 strains of ST1 (ID26154) and ST104 (ID24525), isolated from the cerebrospinal fluid (CSF) of a patient with meningitis and from the blood of a patient with sepsis, respectively, were selected for whole genome sequencing. Both strains were cultivated on sheep blood agar at 37 °C and 5% CO2. Genomic DNA was extracted using a QIAGEN DNeasy Blood & Tissue Kit (Qiagen GmbH, Hilden, Germany) according to the manufacturer's instructions.

Genome sequencing and analysis
Both STs were subjected to shotgun pyrosequencing using a 454 GS Junior System (Roche). All operations were carried out according to the protocols provided by the manufacturer. The quality of reads was checked using the FASTX-Toolkit package (http://hannonlab.cshl.edu/fastx_toolkit/). The sequencing reads were assembled into contigs using the GS de novo Assembler version 2.9 software (Roche, Basel, Switzerland). Genome sequences were submitted to the NCBI Prokaryotic Genome Annotation Pipeline (PGAP v4.12) for annotation.
The genome sequences were compared using the Artemis Comparison Tool (ACT) to identify the RDs between ST1 and ST104 (Carver et al., 2005). The term RDs refers to sequences that, after genomic comparison, are present in the compared strain but absent in the reference strain. In addition, the following S. suis strain sequences were used for comparative genomic analysis: P1/7, BM407, GZ1, SC84, 05ZYH33, 98HAH33, and 89/1591 (Table S1).
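The definition of an RD given above can be illustrated with a toy sketch. This is not how ACT works and is not the procedure used in this study; it is a simplified k-mer presence/absence scan in which the function name `find_rds` and the parameters `k` and `min_len` are hypothetical choices made only for illustration.

```python
def find_rds(query: str, reference: str, k: int = 8, min_len: int = 12):
    """Return half-open (start, end) intervals of `query` that are not
    covered by any k-mer also found in `reference` (toy RD definition)."""
    # Index every k-mer of the reference sequence.
    ref_kmers = {reference[i:i + k] for i in range(len(reference) - k + 1)}

    # Mark query positions covered by at least one shared k-mer.
    covered = [False] * len(query)
    for i in range(len(query) - k + 1):
        if query[i:i + k] in ref_kmers:
            for j in range(i, i + k):
                covered[j] = True

    # Collect uncovered runs of at least `min_len` bases.
    rds, start = [], None
    for pos, is_covered in enumerate(covered + [True]):  # sentinel closes a trailing run
        if not is_covered and start is None:
            start = pos
        elif is_covered and start is not None:
            if pos - start >= min_len:
                rds.append((start, pos))
            start = None
    return rds


# Hypothetical sequences: the query carries a 20-bp insertion absent from the reference.
left = "ATGCGTACGTTAGCCTAGGATCCA"
right = "GGCATCGATCGGCTAAGCTTACGA"
insertion = "T" * 20
reference_seq = left + right
query_seq = left + insertion + right
print(find_rds(query_seq, reference_seq))
```

A real comparison would work on whole genomes with alignment-based tools; the sketch only shows the "present in the compared strain, absent in the reference" logic.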

Distribution of ST104 RDs in different S. suis strains
Of the 58 RDs compared against the sequences in the GenBank database using the BLASTN tool, only nine were unique to ST104; these were selected for PCR primer design using the Primer-BLAST tool (https://www.ncbi.nlm.nih.gov/tools/primer-blast/index.cgi?). The primer sequences for PCR screening of ST104-RDs are shown in Table 1. PCR tests targeting the nine selected RDs present in the ST104 strain were performed using the reference strains of the 29 S. suis serotypes, Streptococcus parasuis, Streptococcus ruminantium, Streptococcus orisratti (formerly S. suis serotypes 20, 22, 26, 32, and 34), and different S. suis strains isolated from humans in Thailand (Table 2).
PCR was performed in a total volume of 25 µl containing 1× JumpStart™ REDTaq® ReadyMix™ Reaction Mix (Sigma, St. Louis, MO, USA) and 0.4 µM of each primer (Table 1). The PCR program consisted of an incubation for 3 min at 95 °C, 30 cycles of 30 s at 95 °C, 30 s at 56 °C, and 3 min at 72 °C, and a final extension for 5 min at 72 °C. The PCR products were analyzed by 2% agarose gel electrophoresis in 0.5× TBE buffer at a constant voltage of 100 V for 30 min (Mupid exU system, Takara, Tokyo, Japan). The gels were stained with ethidium bromide and then photographed on an ultraviolet illuminator (GeneGenius Bioimaging System, SynGene, Cambridge, United Kingdom). The sizes of the PCR products were compared with the GeneRuler™ 100 bp Plus DNA ladder (Thermo Fisher Scientific, Waltham, MA, USA) as the molecular size standard. S. suis P1/7 was used as a control in all PCR reactions.
Note: * Since these RDs were large fragments, two PCR reactions (two primer pairs) were required to efficiently amplify and cover the whole RDs.
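As a quick arithmetic check on the cycling program described above, the short sketch below sums the programmed hold times and computes the primer volume per reaction. The function names are hypothetical, ramp times between steps are ignored, and the 10 µM primer stock concentration is an assumption that does not appear in the source.

```python
def pcr_hold_time_s(initial_s: int, cycles: int, step_s: tuple, final_s: int) -> int:
    """Total programmed hold time in seconds (ramp times not included)."""
    return initial_s + cycles * sum(step_s) + final_s


def primer_volume_ul(final_um: float, reaction_ul: float, stock_um: float) -> float:
    """Volume of primer stock per reaction from C1 * V1 = C2 * V2."""
    return final_um * reaction_ul / stock_um


# Program from the text: 3 min at 95 C; 30 x (30 s, 30 s, 3 min); 5 min at 72 C.
total_s = pcr_hold_time_s(initial_s=180, cycles=30, step_s=(30, 30, 180), final_s=300)
print(total_s / 60)  # -> 128.0 (minutes of programmed hold time)

# 0.4 uM final concentration in 25 ul, assuming a 10 uM stock (assumption).
vol_ul = primer_volume_ul(final_um=0.4, reaction_ul=25.0, stock_um=10.0)
print(round(vol_ul, 2))
```

The run therefore takes a little over two hours of hold time, most of it in the 3 min extension step, which is consistent with the need to amplify large RD fragments.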

RESULTS AND DISCUSSION
General features of S. suis serotype 2 genomes
Sequencing of strain ID26154 (ST1) generated 150,365 reads with an average length of 390.14 bp (range = 40-739 bp), which assembled de novo into 36 non-redundant contigs using the GS de novo Assembler program. Sequencing of strain ID24525 (ST104) generated 148,157 reads with an average length of 386.15 bp (range = 40-624 bp), which assembled de novo into 57 non-redundant contigs. The genomes of our ST1 and ST104 strains were 2,029,291 bp and 2,083,364 bp in length, with GC contents of 41.21% and 41.09%, respectively. Strain ID26154 harbored 1,983 genes with 1,845 coding sequences, three rRNA genes, 42 tRNA genes, and four ncRNA genes. Strain ID24525 harbored 2,086 genes with 1,908 coding sequences, three rRNA genes, 41 tRNA genes, and four ncRNA genes.
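The read statistics above imply an approximate sequencing depth for each genome. The depth is not reported in the source; the sketch below is a back-of-envelope estimate that assumes every read contributes to the assembly, and the function name `estimated_depth` is hypothetical.

```python
def estimated_depth(n_reads: int, mean_read_len_bp: float, assembly_len_bp: int) -> float:
    """Approximate depth = total sequenced bases / assembly length."""
    return n_reads * mean_read_len_bp / assembly_len_bp


# Values taken from the text for ID26154 (ST1) and ID24525 (ST104).
depth_st1 = estimated_depth(150_365, 390.14, 2_029_291)
depth_st104 = estimated_depth(148_157, 386.15, 2_083_364)
print(round(depth_st1, 1), round(depth_st104, 1))  # -> 28.9 27.5
```

Under this assumption both genomes were sequenced to a similar depth of roughly 27-29×, so differences in contig number are unlikely to reflect a large difference in raw data volume.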

RDs present in ST1 but absent in ST104 genomes
Comparison between the genomes of ST1 and ST104 showed many RDs present in the ST1 genome but absent in the genome of the ST104 strain. The RDs in the ST1 strain were identical to those previously reported for this ST (Table S2) (de Greeff et al., 2011; Zheng et al., 2011). Seventy RDs were identified in the ST1 genomes of the P1/7, BM407, GZ1, and ID26154 strains (Table S2). The size of the RDs varied from 213 bp to 17.6 kb. Some genes within these 70 RDs were present in ST1 only, such as the srtBCD cluster, revS, epf, and rgg. Six RDs had lower GC contents (<30% GC) than the ST1 (ID26154) chromosome, suggesting that they are putative laterally transferred genes or sequences (Table S2).
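The GC-content screen mentioned above can be sketched as follows. This is a minimal illustration, not the authors' workflow: the function names and the example sequences are hypothetical, and only the 30% threshold is taken from the text.

```python
def gc_percent(seq: str) -> float:
    """GC content (%) computed over unambiguous A/C/G/T bases only."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt


def low_gc_regions(regions: dict, threshold: float = 30.0) -> list:
    """Names of regions whose GC content falls below `threshold` percent."""
    return [name for name, seq in regions.items() if gc_percent(seq) < threshold]


# Hypothetical toy regions; real RDs would be read from the assembled contigs.
example_regions = {"rd_a": "AATTAATTGC", "rd_b": "GGCCGGCCAT"}
print(low_gc_regions(example_regions))  # -> ['rd_a']
```

A GC content well below the chromosomal average (about 41% here) is a common, though not conclusive, indicator that a region was acquired by lateral transfer.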
Eight RDs (RDs 6, 14, 24, 33, 47, 57, 65, and 66) previously described to be associated with high pathogenicity in CC1, particularly ST1 and ST7, were present in the ST1 strain in the present study (Table S2; Zheng et al., 2011). Only the capsule polysaccharide locus was present in both the ST1 and ST104 strains, because both strains belong to serotype 2. This confirmed that the ST1 strain used in the present study should present a virulence potential similar to that of other ST1 and CC1 members. In addition, analysis of the virulence-associated genes showed that many of them were present in the ST1 strain ID26154 (Table 3). Three proposed novel marker genes (SSU_RS03100: hypothetical protein, SSU_RS09155: hypothetical protein, and SSU_RS09525: RNA-binding protein) for predicting the pathogenic pathotype of S. suis were also present in our ST1 strain (100% identity), suggesting these are also present in Asian or Eurasian strains (Table 3; Estrada et al., 2022). However, these marker genes need further evaluation in S. suis strains from different geographical origins.

RDs present in ST104 but absent in ST1 genomes
Comparison of the genomic sequence of ST104 (ID24525) with those of the ST1 (ID26154, P1/7, BM407, GZ1) and ST7 (SC84, 98HAH33, 05ZYH33) strains revealed 58 RDs that were present in the ST104 genome but absent in the ST1 and ST7 genomes. These RDs contained genes involved in carbohydrate and amino acid metabolism, DNA replication and recombination, metal resistance, cell wall anchored proteins with and without the C-terminal cell wall sorting signal (CWS) of the LPXTG motif, bacteriophages and other mobile genetic elements, ABC-type transporters, two-component signal transduction, transcription regulators, CRISPR-Cas proteins, restriction endonucleases, and lantibiotic proteins (Table S3). However, 17 RDs in the ST104 genome were also present in the ST25 genome (strain 89-1591) (Table S3). Among these 58 RDs, the surface or cell wall anchored proteins containing the CWS in ST104 differed from those of the ST1 and ST7 strains. They were located in five RDs comprising two gene clusters and four individual genes: two class C sortase family gene clusters in RD2 (srtG cluster; 37.57% GC) and RD30 (srtF cluster; 39.4% GC), one LPXTG motif cell wall-anchored protein gene in RD23 (37.96% GC), two LPXTG motif cell wall surface protein genes, including collagen adhesin, in RD39 (40.41% GC), and one muramidase-released-like protein gene (mrp-like) in RD52 (38.86% GC) (Table S3 and Fig. 1). These surface proteins may be important for human-pathogen interactions, such as adhesion to or invasion of host cells, adhesion to the extracellular matrix (ECM), and binding to specific immune system components; they may play an important role in the initial colonization steps rather than being involved in systemic infection (Navarre & Schneewind, 1999; Schneewind & Missiakas, 2014). More details of these surface proteins in the five RDs are described in the section on surface protein genes in ST104 (below).

PCR detection of ST104-RDs distribution in different S. suis strains
Following the genomic comparison of the ST104 genome with those of the ST1 and ST7 strains, we selected nine candidate RDs (among the 58 RDs) found exclusively in the ST104 strain in this study and designed PCR primer pairs to amplify each of them, in order to evaluate their presence in other S. suis strains. As shown in Table 2, only RD23 (37.96% GC) and RD50 (30.3% GC) were present in all ST104 strains but generally absent in ST1 serotype 2 strains; however, they were present in strains belonging to other STs. The ST233 (CC233/379) strains, which are closely related to ST104 according to MLST analysis (Kerdsin et al., 2018), contained six of the RDs found in ST104. The selected RDs, except for RD2 and RD57, were mostly present in ST104 (and ST233) strains rather than in other STs of S. suis serotypes 2 and 14 (Table 2). However, some CC1 (STs 1, 105, 127, and 237) strains harbored ST104-RDs, as shown in Table 2. Therefore, certain CC1 strains may carry either homologs of the ST104 RDs or only the PCR-targeted regions (not the whole RDs). RD2, containing the srtG gene cluster of class C sortase, was widely distributed in the ST28, ST25, and ST103 strains of serotype 2 and in other serotypes (serotypes 1/2, 3, 9, 11, 12, 19, 29, and 30). In addition, a zot-motif protein gene in RD57 was found in ST1, ST28, and ST25 of serotype 2, in ST105, ST127, and ST237 of serotype 14, and in serotypes 3, 7, 9, 16, 17, and 19 (Table 2).

Surface protein genes in ST104
Surface protein genes encode proteins with LPXTG or related motifs. LPXTG-motif proteins have been implicated in the binding of extracellular matrix proteins and adhesion to host cells (Baums & Valentin-Weigand, 2009; Gottschalk et al., 2010). Many proteins with an LPXTG motif have also been suggested to be putative virulence factors, such as the muramidase-released protein (MRP), sortases, surface protein 1 (SAO), and opacity factor (OFS) (Gottschalk et al., 2010).
As previously mentioned, five RDs containing genes encoding proteins with an LPXTG motif were identified in the ST104 strain analyzed in the present study (Fig. 1 and Table S3). When tested using PCR, these genes were present in ST104 strains; however, they were absent in almost all ST1 strains and some other strains (Table 2). These genes were the srtG gene cluster (RD2), the srtF gene cluster (RD30), the Gram-positive anchor protein gene (RD23), the collagen adhesin and the LPXTG-motif protein with T surface-antigen of pili (RD39), and the MRP-like protein (RD52).
The srtG gene cluster (RD2) was also found in the ST25, ST28, ST103, and ST233 strains of serotype 2, and in the strains of serotypes 1/2, 3, 9, 11, 12, 19, 29, and 30 (Table 2). However, this cluster was absent in all tested strains of ST1, as shown in Table 2. The gene cluster is composed of srtG, encoding a sortase, and sgp1 and sgp2, encoding putative pilin subunits (Fig. 1). A similar structure is present in S. suis strain 89-1591 (ST25) and was predicted to encode putative pili (Takamatsu et al., 2009). In addition, several S. suis isolates from humans and diseased pigs had the srtG cluster (Takamatsu et al., 2009). The function of the srtG cluster-mediated pili is still unknown; however, this gene cluster is expressed at a high level when bacteria are grown at <30 °C (Okura et al., 2011). The surface temperature of different external body parts of pigs ranged from 20 °C to 30 °C when the environmental temperature was approximately 20 °C. This pilus may be involved in the interaction of the bacterium with the host surface components during the first steps of infection (Okura et al., 2011).
Four individual LPXTG-motif surface protein genes were found in three RDs: RD23, RD39, and RD52. In RD23, a Gram-positive anchor protein gene was identified according to its protein motif using Pfam database identification. This protein has an identity of 72% to the hypothetical protein SSUST1_0363 of S. suis serotype 1 (accession no. AER20776), 59% to the hypothetical protein SSUR61_1521 of S. suis strain R61 (unknown serotype; accession no. EHC02107), 47% to a Gram-positive anchor of Streptococcus infantis SK970 (accession no. EGV04462), and 45% to the surface exclusion protein PrgA of Streptococcus oralis ATCC 35037 (accession no. ZP07640015). The function of this gene or protein is not yet known; however, the properties of proteins in the superfamily of Gram-positive anchor proteins suggest that it may be associated with host cell adhesion and invasion (Fischetti, Pancholi & Schneewind, 1990).
RD39 consisted of two LPXTG-motif protein genes, encoding collagen adhesin (or collagen-like protein) and an LPXTG-motif protein containing the T-surface antigen of pili, based on Pfam database identification. Collagen adhesin is widely present in pathogenic streptococci, including Streptococcus pyogenes, Streptococcus agalactiae, Streptococcus pneumoniae, and Streptococcus equi (Abranches et al., 2011). Collagen adhesin binds to ECM, collagen, and laminin, and mediates adherence to human coronary artery endothelial cells, as demonstrated in Streptococcus mutans (Abranches et al., 2011). In S. pyogenes, the collagen adhesin promotes adhesion and biofilm formation and decreases bacterial killing by neutrophil extracellular traps in tissues (Lukomski et al., 2017). It is also a potential risk factor for hemorrhagic stroke (Nakano et al., 2011). The LPXTG-motif protein containing the T-surface antigen of pili has a region similar to the fctA gene of S. pyogenes, which encodes surface proteins, including fibronectin- and collagen-binding proteins and serological markers known as T antigens, which give rise to pilus-like appendages (Lizano, Luo & Bessen, 2007). In S. pyogenes, pili play a role in the adherence to and colonization of human tissues (Manetti et al., 2007).
RD52 is composed of two muramidase-released-like protein genes (MRP-like), which differ in their sequence similarity to known MRPs. The first MRP-like sequence was most similar to the MRPs of Gemella haemolysans ATCC10379, Granulicatella elegans ATCC700633, Streptococcus vestibularis ATCC49124, and Streptococcus mitis NCTC12261 (53%, 50%, 50%, and 49%, respectively). The second MRP-like sequence was similar to the MRP sequence of S. suis, with an identity range of 38-42%. Because of the low similarity to the authentic S. suis MRP, the primers used in the PCR tests for S. suis MRP could not detect the MRP-like gene in S. suis ST104 strains (Silva et al., 2006; Kerdsin et al., 2018).
However, five putative virulence genes, salK/salR, the srtBCD gene cluster, revS, rgg, and epf, described as being present in the ST1 or ST7 strains, were absent in ST104 (Fittipaldi et al., 2012; Takamatsu et al., 2009). ST104 strains also lack an additional putative virulence factor, SSU0835 (an ABC-type multidrug transport system), which has been described as being involved in the invasion of porcine brain microvascular endothelial cells (Vanier et al., 2009). In addition, S. suis can cross the blood-brain barrier, causing meningitis under the action of suilysin, which is cytotoxic to the brain microvascular endothelial cells (Fittipaldi et al., 2012). ST104 strains failed to cause high levels of meningitis in a mouse model owing to low or no production of suilysin, which resulted from a negligible level of transcription of the sly gene and undetectable sly promoter activity (Takeuchi et al., 2014). The lack of these putative virulence factors may explain why ST104 caused less meningitis than sepsis.
In addition, Dong et al. (2021) identified a human-associated clade (HAC) from an analysis of 1,634 S. suis isolates. This HAC is strongly associated with human infections. Among 25 HAC-specific marker genes, two selected genes (G15: sigma-70 and G20: relaxase mobilization nuclease protein) were specific to HAC when tested with 12 training HAC isolates, 21 human isolates, and 10 low-virulence pig isolates, as described elsewhere (Dong et al., 2021). Analysis of these two HAC marker genes in our ST104 strain revealed that they were absent from its genome. This indicates either that ST104 may not belong to HAC or that these two marker genes may not be appropriate for human S. suis strains from different geographical regions. However, this strain was isolated from a human patient, suggesting that it is pathogenic and might be related to HAC. More ST104 strains should be analyzed in further studies.
On the other hand, our ST1 strain contained these two HAC marker genes (100% identity), suggesting it belonged to HAC.

CONCLUSION
This study demonstrated the differences between the genomes of ST1 and ST104 strains of S. suis serotype 2 isolated from humans. Fifty-eight RDs were unique to the ST104 genome; they were involved in metabolism and cell functional activities, cell wall anchored proteins, bacteriophages and mobile genetic elements, ABC-type transporters, two-component signal transduction, and lantibiotic proteins. Some virulence genes found in ST1 were also present in the ST104 genome, suggesting the potential virulence of the latter. Further studies using ST104 strains with in vitro and in vivo models will help to explain why this ST is mostly involved in sepsis rather than meningitis.

ADDITIONAL INFORMATION AND DECLARATIONS
Funding
This study was supported by the Japan Society for the Promotion of Science (JSPS) and a Grant-in-Aid for Scientific Research (B: 21406027), and the Office of the Ministry of Higher Education, Science, Research and Innovation and the Thailand Science Research and Innovation through the Kasetsart University Reinventing University Program 2021. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.